Academic Torrents: Scalable Data Distribution
Authors
Abstract
As competitions grow more popular, transferring ever-larger data sets becomes costly and, at some point, infeasible. For example, downloading the 157.3 GB 2012 ImageNet data set incurs about $4.33 in bandwidth costs per download, and downloading the full ImageNet data set takes 33 days. ImageNet has since become popular beyond the competition, and many papers and models now revolve around this data set. To share such an important resource with the machine learning community, the hosts of ImageNet must shoulder a large bandwidth burden. Academic Torrents reduces this burden for disseminating competition data and also increases download speeds for end users. By augmenting an existing HTTP server with a peer-to-peer swarm, requests are re-routed so that data is also fetched from other downloaders. While existing systems slow down as more users join, the benefits of Academic Torrents grow, with noticeable effects even when only one other person is downloading.
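The bandwidth figures in the abstract can be turned into a rough cost estimate. The sketch below is only an illustration: the data set size and per-download cost come from the abstract, while the function names and the `peer_fraction` parameter (the share of bytes served by the swarm rather than by the HTTP host) are hypothetical and not taken from the paper.

```python
# Figures quoted in the abstract.
DATASET_GB = 157.3            # size of the 2012 ImageNet data set
COST_PER_DOWNLOAD_USD = 4.33  # host's bandwidth cost for one full download


def implied_price_per_gb() -> float:
    """Bandwidth price per GB implied by the two figures above."""
    return COST_PER_DOWNLOAD_USD / DATASET_GB


def host_cost(downloads: int, peer_fraction: float = 0.0) -> float:
    """Estimated bandwidth cost borne by the HTTP host.

    peer_fraction is a hypothetical modelling parameter: the share of
    each download's bytes that comes from other downloaders instead of
    the host. 0.0 means plain HTTP; 1.0 means the swarm serves everything.
    """
    if downloads < 0:
        raise ValueError("downloads must be non-negative")
    if not 0.0 <= peer_fraction <= 1.0:
        raise ValueError("peer_fraction must be between 0 and 1")
    return downloads * COST_PER_DOWNLOAD_USD * (1.0 - peer_fraction)


if __name__ == "__main__":
    print(f"Implied price: ${implied_price_per_gb():.4f} per GB")
    for fraction in (0.0, 0.5, 0.9):
        cost = host_cost(1000, fraction)
        print(f"1000 downloads, {fraction:.0%} from peers: ${cost:,.2f}")
```

Under this simple linear model the host's cost falls in proportion to the share of traffic the swarm absorbs. Real savings depend on how many peers are online and how much upload bandwidth they contribute.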
Similar resources
Dynamic swarm management for improved BitTorrent performance
BitTorrent is a very scalable file sharing protocol that utilizes the upload bandwidth of peers to offload the original content source. With BitTorrent, each file is split into many small pieces, each of which may be downloaded from different peers. While BitTorrent allows peers to effectively share pieces in systems with sufficient participating peers, the performance can degrade if participat...
A Torrent Recommender based on DHT Crawling
The DHT Mainline is a significant extension to the BitTorrent protocol. The DHT Mainline has several million users and is the largest DHT network. This thesis uses the DHT Mainline to generate a recommendation system for torrents. A program was written crawling the entirety of the torrent search engine kickass.to gathering metadata about torrents. The DHT Mainline was then crawled to search for...
The Pirate Bay Torrent Analysis and Visualization
Using a parser written in C#, we process about 3.4 million pieces of data over 680 thousand torrents from thepiratebay.org and create a graphical representation of the data as an infographic. An infographic presents the information in an easily readable format and can be distributed across many web media. Based on the representation/analysis of the data, we are able to determine some interesting cha...
Access control in ultra-large-scale systems using a data-centric middleware
The primary characteristic of an Ultra-Large-Scale (ULS) system is ultra-large size on any related dimension. A ULS system is generally considered a system-of-systems with heterogeneous nodes and autonomous domains. As the size of a system-of-systems grows and the demand for interoperability between sub-systems increases, achieving a more scalable and dynamic access control system becomes an im...
Angling for Big Fish in BitTorrent
BitTorrent piracy is at the core of fierce debates around network neutrality. Most of the legal actions against BitTorrent exchanges are targeted toward torrent indexing sites and trackers. Surprisingly, little is known about the initial seeds that insert contents on BitTorrent and about the highly active peers that are present in a large number of torrents. The main reason is that acquiring th...
Journal title:
- CoRR
Volume abs/1603.04395, Issue -
Pages -
Publication date 2016